



Checklist

Neural Information Processing Systems

The checklist follows the references. Please read the checklist guidelines carefully for information on how to answer these questions. You are strongly encouraged to include a justification to your answer, either by referencing the appropriate section of your paper or providing a brief inline description. Please do not modify the questions and only use the provided macros for your answers. Note that the Checklist section does not count towards the page limit. In your paper, please delete this instructions block and only keep the Checklist section heading above along with the questions/answers below.


Baselines

Neural Information Processing Systems

As shown in the main text, under the assumption that the influence network is unbiased, our factor baselines are indeed valid control variates. We prove this result below, repeating the statement itself for posterity and providing a supplementary lemma on control variates as a restatement of known results. Let X, Y and Z be random variables where the law of X conditional on Z is denoted $P_\theta(X \mid Z)$, and Y is independent of X conditioned on Z; i.e. Then, we have that $\mathbb{E}[Y \nabla_\theta \ln P_\theta(X)] = 0$. Proof. Factor baselines are valid control variates if GΣ is true to the MDP (i.e.
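The lemma can be checked numerically on a small discrete example. The sketch below is not taken from the paper: it assumes a softmax-parameterised categorical law for X given Z, reads the score in the lemma as that of the conditional law $P_\theta(X \mid Z)$, and uses hypothetical names (`expected_baseline_times_score`, `theta`, `p_z`, `y_given_z`). It enumerates every (z, x) pair to compute the expectation exactly.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def expected_baseline_times_score(theta, p_z, y_given_z):
    """Exact E[Y * grad_theta ln P_theta(X | Z)] by enumeration.

    theta[z] holds the logits of a categorical P_theta(X | Z=z).
    Y is independent of X given Z, so only E[Y | Z=z] enters the
    expectation; y_given_z[z] supplies that value (an assumption of
    this toy setup). Returns a matrix with the same shape as theta.
    """
    grad = [[0.0] * len(row) for row in theta]
    for z, pz in enumerate(p_z):
        probs = softmax(theta[z])
        for x, px in enumerate(probs):
            for k in range(len(probs)):
                # Softmax score: d ln p(x) / d theta[z][k] = 1[k == x] - p(k)
                score = (1.0 if k == x else 0.0) - probs[k]
                grad[z][k] += pz * px * y_given_z[z] * score
    return grad


theta = [[0.3, -1.2, 0.8], [2.0, 0.1, -0.5]]  # hypothetical logits
p_z = [0.4, 0.6]                               # hypothetical law of Z
y_given_z = [5.0, -3.0]                        # hypothetical E[Y | Z=z]
result = expected_baseline_times_score(theta, p_z, y_given_z)
```

Every entry of `result` vanishes up to floating-point error, because for each fixed z the score averages to zero under $P_\theta(X \mid Z = z)$ and Y contributes only a factor that does not depend on x.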


Appendix for "Unifying Behavioral and Response Diversity for Open-ended Learning in Zero-sum Games" Table of Contents

Neural Information Processing Systems

A.1 Proof of Theorem 1. To prove Theorem 1, we need the help of the following lemma. See Proposition 7.1 in [3]. Now we can prove our Theorem 1. Proof. For games with only one step (normal-form games, functional-form games), there is only one fixed state. Therefore, the distribution of state-action is equivalent to the distribution of the action. A.2 Proof of Theorem 2. Let us restate our Theorem 2. Theorem 2. For a given empirical payoff matrix $A \in \mathbb{R}^{M \times N}$ and the reward vector $a_{M+1}$ for policy M + $\|(I - A^\top (A^\top)^\dagger) a_{M+1}\|^2$, (18) where $(A^\top)^\dagger$ is the Moore-Penrose pseudoinverse of $A^\top$, and $\sigma_{\min}(A)$ is the minimum singular value of A. Proof. The last equation comes from the analytic calculation of $\min_{\mathbf{1}^\top \beta = 1} \|\beta - (A^\top)^\dagger a_{M+1}\|^2$ using Lagrangian.
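The Lagrangian step mentioned at the end of the proof can be illustrated with a short sketch. It is not the paper's code and rests on one assumption: that the objective reads as the squared distance between β and a fixed vector c (standing in for the pseudoinverse applied to the reward vector), minimised over vectors whose entries sum to one. The function name `constrained_min` and the sample vector `c` are hypothetical.

```python
def constrained_min(c):
    """Minimise ||beta - c||^2 subject to sum(beta) == 1.

    Setting the gradient of the Lagrangian
        ||beta - c||^2 + lam * (sum(beta) - 1)
    to zero gives beta = c - (lam / 2) * 1, and the constraint then
    fixes -(lam / 2) = (1 - sum(c)) / n, a uniform shift of c.
    """
    n = len(c)
    shift = (1.0 - sum(c)) / n
    beta = [ci + shift for ci in c]
    value = n * shift ** 2  # equals (1 - sum(c))^2 / n
    return beta, value


c = [0.5, -0.2, 0.1]  # hypothetical stand-in for the unconstrained target
beta_star, min_value = constrained_min(c)
```

Under this reading the minimiser is a uniform shift of c onto the constraint plane, and the minimum equals the squared constraint violation of c divided by the dimension.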






Disentangled behavioural representations

Neural Information Processing Systems

Here, we show how to benefit from the flexibility of RNNs while representing individual differences in a low-dimensional and interpretable space. To achieve this, we propose a novel end-to-end learning framework in which an encoder is trained to map the behavior of subjects into a low-dimensional latent space.